Feature Engineering for Arabic Text Classification
Similar resources
Feature Engineering for Text Classification
Most research in text classification to date has used a “bag of words” representation in which each feature corresponds to a single word. This paper examines some alternative ways to represent text based on syntactic and semantic relationships between words (phrases, synonyms and hypernyms). We describe the new representations and try to justify our hypothesis that they could improve the perfor...
Text Summarization as Feature Selection for Arabic Text Classification
Text classification (TC), or text categorization, is the task of assigning a document to one or more predefined classes or categories. A common problem in TC is the high number of terms or features in the document(s) to be classified (the curse of dimensionality). This problem can be solved by selecting the most important terms. In this study, automatic text summarization is used for feature selection. S...
An Efficient Feature Selection Method for Arabic Text Classification
This paper proposes an efficient, Chi-Square-based feature selection method for Arabic text classification. In data mining, feature selection is a preprocessing step that can improve classification performance. Although a few works have studied the effect of feature selection methods on Arabic text classification, only a limited number of methods were compared. Furthermore, different datasets were u...
Arabic Language Text Classification Using Dependency Syntax-Based Feature Selection
We study the performance of Arabic text classification combining various techniques: (a) tfidf vs. dependency syntax, for feature selection and weighting; (b) class association rules vs. support vector machines, for classification. The Arabic text is used in two forms: rootified and lightly stemmed. The results we obtain show that lightly stemmed text leads to better performance than rootified ...
Ontology-guided feature engineering for clinical text classification
In this study we present novel feature engineering techniques that leverage the biomedical domain knowledge encoded in the Unified Medical Language System (UMLS) to improve machine-learning based clinical text classification. Critical steps in clinical text classification include identification of features and passages relevant to the classification task, and representation of clinical text to ...
Journal
Journal title: Journal of Engineering and Applied Sciences
Year: 2019
ISSN: 1816-949X
DOI: 10.36478/jeasci.2019.2292.2301